Abstract

This paper describes how techniques from Artificial 
Life can be used to evolve a population of personalized 
information filtering agents. The technique of artifi- 
cial evolution and the technique of learning from feed-
back are combined to develop a semi-automated infor-
mation filtering system which dynamically adapts to 
the changing interests of the user. We present results 
of a set of experiments in which a small population of 
information filtering agents was evolved to make a per- 
sonalized selection of USENET netnews messages for 
a particular user. The results show that the artificial 
evolution component of the system is responsible for 
improving the recall rate of the selected set of articles, 
while the learning from feedback component improves the
precision rate. 

1 Introduction 

One of the main problems in building a system for 
personalized information filtering is the construction 
of a profile of the user's information interests. There 
are three subproblems involved. The first is finding 
a representation for the user profile that allows both 
power and flexibility. Second, it is important that the 
user be able to communicate her desires and interests 
to the system so that an initial profile can be con- 
structed. Finally, the system has to be responsive and 
change this initial profile as the interests of the user 
change over time. 

This paper proposes to use techniques from the field 
of Artificial Life to build a personalized information 
filtering system. Artificial evolution - often imple- 
mented as a genetic algorithm - has proven to be an 
effective parallel search technique in a number of prob-
lem domains [2, 7, 8, 9]. It has also been shown that
combining artificial evolution with individual learning 
by the evolved organisms speeds up the search process 
significantly [1, 10]. This combination of techniques is 
also particularly useful in situations where the opti- 
mal solution keeps changing over time. This property 
makes them attractive as a technique for searching the 
space of user profiles in an adaptive information filter- 
ing system. 

The first section of this paper discusses the Informa- 
tion Filtering problem and presents a short overview 
of previous work in the field. We then show how a 
genetic algorithm combined with individual learning 
can be used for the search of a user profile. 



Examples are presented from an implemented pro- 
totype which filters news articles from the USENET 
newsgroups. Experimental results demonstrate how 
the different mechanisms of the system relate to per- 
formance evaluation parameters. In particular, the re- 
sults show that the technique of genetic variation is re- 
sponsible for improving the recall of the set of articles 
retrieved, while the technique of learning from feed- 
back is responsible for improving its precision. The 
last section presents some concluding remarks along 
with a discussion of future research. 

2 Information Filtering 

Information filtering has been used to describe a 
variety of processes involving the delivery of informa- 
tion to users. While information filtering is related 
to processes such as retrieval, routing, categorization, 
and extraction, the distinction needs to be made clear 
so as to focus on the specific research issues associated 
with filtering [3]. 

Information filters are mediators between sources 
of information and their end-users. Filtering applica-
tions typically involve streams of incoming data, either
being broadcast by remote sources or sent directly by 
other sources. These data may also be the result of 
database searches. Information filtering is typically 
concerned with repeated uses of the system, by a per- 
son or persons with long-term goals or interests, unlike 
a typical information retrieval system. 

Filtering mainly deals with a dynamic data stream, 
as opposed to a static database, from which texts are 
selected or eliminated. This also has a bearing on the 
performance evaluation criteria to be used for a fil- 
tering system. The user's mode of interaction with a 
filtering system is fairly different from other informa- 
tion gathering systems. Instead of responding to user 
interaction in a single information-seeking episode, a 
filtering system has to deal with long-term changes 
over a series of information-seeking episodes. Informa- 
tion filters are more likely to be personalised to serve 
the same user's need over a relatively long period of 
time. Learning and adaptation are, therefore, issues 
of prime importance to filtering systems [3]. 

Some of the research carried out in information re- 
trieval is directly relevant to information filtering sys- 
tems. Especially, work done in the areas of text repre- 
sentation, retrieval techniques, and user modelling can 
be leveraged to design better filtering systems. Con-
ventional text representation schemes commonly use
indexing methods, while more sophisticated schemes 
use clustering, Boolean or probabilistic models, or vector
spaces to represent texts [11]. Retrieval techniques 
are concerned with estimating the "score" of an object 
to be retrieved. Research in user modelling has been 
mainly focussed on query formulation and relevance 
feedback as mechanisms for the system to acquire in- 
formation about the goals of the user. Performance of 
retrieval systems is shown to be significantly improved 
by using simple relevance feedback techniques [12]. 

A number of different approaches have been used 
to automate information filtering. Rule based systems 
which observe a user's usage patterns and make sugges-
tions based on them have been described in [4]. Rules
are used to measure usage patterns such as commonly
occurring terms, as well as timeliness measurements
like frequency and recency. This helps in bringing us- 
age patterns to the attention of the users. Statistical 
methods have been useful in improving filtering meth- 
ods. [5] presents results of an experiment aimed at 
determining the effectiveness of four statistical infor- 
mation filtering methods in the domain of technical 
reports. A novel mechanism for collaborative filtering 
in which users annotate documents is presented in [6].
When new documents arrive, "eager readers" anno- 
tate the documents, while "casual readers" can install 
filters which use these annotations in addition to the 
content of the document. 

One of the desirable features in an information fil-
tering system is that it recommends new information
not already in the profile, which might possibly be of 
interest to the user. A rule based system which looks 
for usage patterns can only comment upon what the 
user is already doing, not change it. One of the ad- 
vantages of the artificial evolution approach described 
in this paper is its exploratory behavior. By muta- 
tions and crossovers of fit information filtering agents, 
the system can explore newer domains which may be 
of potential interest to the user. Another desirable 
feature is that the filtering system should be able to 
unlearn previously learned knowledge when the user's
interests change. A statistical profile builder might
build a good user profile, but then there is a high in- 
ertia towards unlearning when necessary. In artificial 
evolution, agents have to continually gain fitness in 
succeeding generations, else they are eliminated from 
the population. This means that an agent which had 
a high fitness value in the last generation might not be 
able to survive to the next, if it does not gain fitness 
in the present generation. This enables the system to 
be dynamically adaptive to the user's interests. 

3 The Algorithm 

The problem of building a personalized Information 
Filtering system can be viewed as a search process. It 
involves searching over the large and complex space 
of possible user profiles, for an "optimal" user profile 
(or a set of profiles) that match the user's different 
interests. This "optimal" user profile has to vary as 
the user's interests change over time. 

Evolution can be viewed as search in a space of 
genotypes for the ones that are the fittest (or the best
adapted) to survive in the environment. Cycles of
genetic variation followed by selection of the fittest 
produce a relatively fitter species with every genera- 
tion. Genetic Algorithms extract and generalize crit- 
ical processes of evolution and use them to solve ar- 
tificial search problems [9]. They have proven very 
successful in searching for global optima in large and 
complex search spaces. 

Searching a large and changing space involves a 
trade-off between two objectives: (i) exploiting the 
currently available solution and (ii) further exploring
the search space for a possibly better solution. Hill 
Climbing is an example of a search technique which 
exploits the best known alternative. However, because 
of this very reason, it is likely to get stuck in local max- 
ima. Random Search, on the other hand, is an extreme 
case of an exploring search technique: it is unsatisfac- 
tory as it does not make use of the best solution found 
so far. Genetic Algorithms manage the trade-off be- 
tween exploration and exploitation in a near optimal 
way — they exploit the solution found so far, while 
Crossover and Mutation operations provide a way of 
exploring the search space for better solutions [9]. 

Several experiments have demonstrated that artifi-
cial evolution is helped by individual learning [1, 10].
This phenomenon is also known as the "Baldwin ef- 
fect": if the organisms evolved are allowed to learn 
during their lifetime, then the evolution towards a fit- 
ter species happens much faster. This is the case be- 
cause every individual is able to explore a "patch" of 
the search space (find the maximum fitness in the lo-
cal neighbourhood of its genotype) rather than a single
point (evaluate the fitness of its own genotype). 

We have used a genetic algorithm with individual 
learning to build a prototype of a personalized Infor- 
mation Filtering system. Presently, we use USENET 
network news as the data stream from which articles 
are retrieved. 

The system consists of a number of news cate-
gories which a user has defined 1 . Each of these
news categories consists of a population of filtering 
agents. These are "organisms" that retrieve articles 
which match an internal representation of the type 
of article they are interested in. The internal repre- 
sentation consists of whatever the organism inherited 
genetically from its parents (the genotype) augmented 
with information it learns during its lifetime. Agents 
are assigned a fitness value based on the user feed- 
back regarding their performance. The user conveys 
whether an article that was retrieved by one or sev- 
eral agents was appreciated or not. The agents learn 
from this feedback by changing their internal repre- 
sentation to reflect this training example. For each 
positive/negative feedback received, an agent gets pos- 
itive/negative fitness points. To create the next gen- 
eration of agents, only the very fit agents are selected 
to produce offspring. The offspring is produced by ap- 
plying the copy, crossover and mutation operators to 
the fit agents. 

This genetic process driven by user feedback causes
the population of filtering agents to evolve towards the
optimal interest profile of the user. The details of the
algorithm are as described below.

1 The way in which a user can define a news category is ex-
plained later.

Xref: clari.news.economy:1925 clari.news.disaster:948
From: clarinews@clarinet.com
Newsgroups: clari.news.economy,clari.news.disaster
Subject: Prices decline on world markets despite hurricane
Keywords: oil, energy, economy, severe weather, trouble
Message-ID: <oilpriceU2aP5pc@clarinet.com>
Date: 25 Aug 92 22:09:12 GMT
References: <oilpriceU2l6540pe@clarinet.com>
Lines: 85
Approved: clarinews@clarinet.com
X-Supersedes: <oilpriceU2aP545pe@clarinet.com>
Location: texas
ACategory: national
Slugword: oilprice
Priority: major
Format: regular
ANPA: Wc: 868; Id: z6205; Sel: txbyo; Adate: 8-25-5pcd; Ver: 38/2
Codes: ybyortx., yne.rtx., ynbwrtx., xxxxxxxx

Figure 1: A sample news header of a "richer", more
structured article.

3.1 Genotype and Internal Representa- 
tion 

Genotypes are the individual points in the search 
space of user profiles. A sample genotype is shown 
below 2 : 

newsgroup: clari.sports.basketball

location: boston, chicago, usa 

source: New York Times 

keywords: Celtics, bulls, jordan, magic johnson 



At birth, an agent creates an internal representa- 
tion based on its genotype. As the agent learns during 
its lifetime, changes are made to this internal repre- 
sentation. The internal representation is structured in 
the same way as the genotype described above. This 
way both the genetic algorithm as well as the learn- 
ing from feedback mechanism search the same space of 
user profiles (which is necessary for the Baldwin effect 
to be able to take place). 

The internal representation, when created, has ex- 
actly the same structure and information as the geno- 
type. In addition it maintains weights for all of the 
attributes (such as keywords, source, authors, etc) as 
it learns that some are more relevant than others. The 
initial weights of the attributes are all small positive
values. This ensures that, while the offspring inherits 
some attributes from the parents (a parental "bias"), 
attributes learned during the organism's lifetime also 
have a fair chance of proving their relevance. 
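
The relationship between genotype and internal representation can be sketched as follows. This is only an illustration, not the prototype's code (which was written in C++): the class name Agent, the dictionary layout and the value of INITIAL_WEIGHT are assumptions; the text says only that initial weights are small and positive.

```python
from dataclasses import dataclass, field

# Assumed value: the paper only states that initial weights are "small positive".
INITIAL_WEIGHT = 0.1


@dataclass
class Agent:
    """Hypothetical filtering agent: a genotype plus a weighted internal representation."""
    genotype: dict                                # field name -> list of values
    weights: dict = field(default_factory=dict)   # (field, value) -> weight
    fitness: float = 0.0

    def __post_init__(self):
        # At birth the internal representation has the same structure and
        # content as the genotype, with a small positive weight per attribute.
        for name, values in self.genotype.items():
            for value in values:
                self.weights.setdefault((name, value.lower()), INITIAL_WEIGHT)


agent = Agent({
    "newsgroup": ["clari.sports.basketball"],
    "location": ["boston", "chicago", "usa"],
    "source": ["New York Times"],
    "keywords": ["Celtics", "bulls", "jordan", "magic johnson"],
})
```

Because every inherited attribute starts at the same small weight, attributes added later through learning are not dominated by the parental bias.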

2 "location" and "source" are record fields provided by the
Net news database.



3.2 Learning from Feedback 

When an agent receives positive feedback, it ex- 
tracts information from the corresponding article 
and incorporates it into its internal representation. 
Presently, the agent extracts most of the information 
provided in the header of the news article (Figure 1), 
in particular the author, keywords, location, category 
and priority fields. If, say, a keyword is already present 
in the internal representation, its weight is increased,
so that the agent is more likely to retrieve similar ar- 
ticles in the future. Conversely, in the case of nega- 
tive feedback, the information is stored with negative 
weight, so as to make it less likely that similar articles 
will be retrieved in the future. 

The user can also manually indicate preference for 
particular keywords occurring in an article. This can
be done by highlighting the appropriate words in the 
text of the article. These keywords (with initial small 
positive or negative weight) get added to the internal 
representation of the agent (if they already exist, their 
weights are increased or decreased respectively). 
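
A minimal sketch of this weight update is given below. The function name learn_from_feedback, the header layout and the step size STEP are assumptions made for illustration; the text does not state by how much a weight changes per feedback event.

```python
# Assumed increment per feedback event; the paper does not give a value.
STEP = 0.1

# Header fields the text says the agent extracts.
FIELDS = ("author", "keywords", "location", "category", "priority")


def learn_from_feedback(weights, header, positive, step=STEP):
    """Update a (field, value) -> weight table in place from one rated article."""
    delta = step if positive else -step
    for name in FIELDS:
        for value in header.get(name, []):
            key = (name, value.lower())
            # Known attributes are strengthened or weakened; new ones are
            # stored with a small positive or negative weight.
            weights[key] = weights.get(key, 0.0) + delta
    return weights


weights = {("keywords", "oil"): 0.1}
learn_from_feedback(weights, {"keywords": ["oil", "energy"], "location": ["texas"]}, positive=True)
learn_from_feedback(weights, {"keywords": ["energy"]}, positive=False)
```

Keywords highlighted by hand could be passed through the same function as a header containing only a keywords field.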

3.3 Phenotype 

The phenotype is the manifested behavior of the 
agent in its environment. Each agent looks up the 
newsgroup as specified in its internal representation. 
Each article header is rated and assigned a relevance 
value. Relevance points are assigned to the article for 
each point of similarity to the internal representation. 
For example, for a keyword in the subject or keywords 
field of the article that matches one in the keywords 
field of the agent, points proportional to the weight of 
the keyword are assigned. The sum of all these rel- 
evance points determines the overall relevance score 
of the article. The articles with high scores are re- 
trieved, the rest are filtered away. The number of ar- 
ticles retrieved by an agent for display to the user is 
proportional to the agent's fitness. 
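
The scoring step might look like the sketch below. The header layout (a subject string plus lists for the other fields), the function names and the per_fitness_point scaling are assumptions; the text specifies only that matches earn points proportional to their weight and that the number of articles shown is proportional to fitness.

```python
def relevance(weights, header):
    """Sum the weights of every attribute the article header shares with the agent."""
    # Words in the subject line are matched against the agent's keywords field.
    terms = {("keywords", w.lower()) for w in header.get("subject", "").split()}
    for name, values in header.items():
        if name != "subject":
            terms.update((name, v.lower()) for v in values)
    return sum(weights.get(term, 0.0) for term in terms)


def retrieve(weights, headers, fitness, per_fitness_point=1):
    """Return the top-scoring headers; the quota grows with the agent's fitness."""
    quota = max(0, int(fitness * per_fitness_point))  # assumed proportionality constant
    ranked = sorted(headers, key=lambda h: relevance(weights, h), reverse=True)
    return ranked[:quota]


weights = {("keywords", "oil"): 0.5, ("keywords", "hurricane"): 0.2, ("location", "texas"): 0.3}
oil_story = {
    "subject": "Prices decline on world markets despite hurricane",
    "keywords": ["oil", "energy"],
    "location": ["texas"],
}
sports_story = {"subject": "Celtics win", "keywords": ["basketball"]}
shown = retrieve(weights, [sports_story, oil_story], fitness=1)
```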

3.4 Fitness function 

An agent (or phenotype) is assigned a fitness value 
based on the user feedback received on articles the 
user reads. For every article the user indicates liking 
or disliking 3 , the agent(s) which were responsible for 
retrieving that article get positive or negative fitness 
points respectively. The interface mechanism for the 
user to indicate her preference is described in the fol- 
lowing section. 
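
As an illustration of how fitness points could be credited, the sketch below assumes a table recording which agents retrieved which article. The names apply_feedback and retrieved_by, and the size of one fitness point, are hypothetical.

```python
def apply_feedback(fitness, retrieved_by, article_id, liked, points=1.0):
    """Credit or debit every agent that retrieved the rated article."""
    delta = points if liked else -points
    for agent_id in retrieved_by.get(article_id, ()):
        fitness[agent_id] = fitness.get(agent_id, 0.0) + delta
    return fitness


fitness = {"agent_a": 0.0, "agent_b": 0.0}
retrieved_by = {"article_1": ["agent_a", "agent_b"], "article_2": ["agent_b"]}
apply_feedback(fitness, retrieved_by, "article_1", liked=True)
apply_feedback(fitness, retrieved_by, "article_2", liked=False)
```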

3.5 Initial Population 

The initial population of agents is created when the 
user creates a new news category. The user must spec- 
ify the name of the news category and can also give 
additional keywords which will be added to the geno- 
types of the first generation of agents. Suppose the 
user creates the news category sports. The system 
then looks up the list of available newsgroups to find 
those which have sports articles (presently, it is just a 
keyword based search). If the number of these news- 
groups is large enough to form a population (as speci- 
fied by the parameter defining the population size dis-
cussed below), then newsgroups are randomly selected
from this set and assigned to the new agents. The user-
specified keywords are assigned to the keywords field
of the genotype.

3 Ideally, we would like the system to be able to deduce this
information automatically based on how much time the user
spent reading the article relative to how long the article is.

Each of these newly created agents has identical 
fitness values. Starting out with an initial generation 
of agents consisting of randomly created agents con- 
structed on the basis of the user's input, the system 
evolves several generations of agents (based on user 
feedback) which are gradually more focussed on those
articles which the user likes. 
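
The creation of the first generation can be sketched as follows. The newsgroup names in the example are hypothetical, as is the decision to raise an error when too few newsgroups match; the text does not say what the prototype does in that case.

```python
import random


def initial_population(category, user_keywords, all_newsgroups, size, rng=random):
    """Create `size` agents with identical fitness for a new news category."""
    # Keyword-based search over newsgroup names, as described in the text.
    candidates = [g for g in all_newsgroups if category.lower() in g.lower()]
    if len(candidates) < size:
        # Assumption: the paper does not describe the fallback behaviour.
        raise ValueError("not enough matching newsgroups to form a population")
    chosen = rng.sample(candidates, size)
    return [
        {"newsgroup": g, "keywords": list(user_keywords), "fitness": 0.0}
        for g in chosen
    ]


groups = ["clari.sports.basketball", "clari.sports.baseball",
          "rec.sports.misc", "clari.news.economy"]
population = initial_population("sports", ["celtics"], groups, 2, random.Random(0))
```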

3.6 Genetic Operators 

The genetic operators employed to create new 
agents are the crossover and mutation operators. These
operators are the driving force behind the search pro- 
cess of the genetic algorithm. The user can either 
explicitly indicate which agents in the current popula- 
tion should be used as the basis for mutation or cross- 
over or, alternatively, the system can make this selec- 
tion automatically based on the fitness of the different 
agents (the probability of an agent being selected to 
reproduce being proportional to its fitness). In addi- 
tion to agents with new genotypes, the new generation 
will consist of copies of the most fit agents of the old 
generation. 
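
Automatic, fitness-proportional selection can be sketched with the standard library. Clamping negative fitness to zero and falling back to uniform selection when no agent has positive fitness are assumptions; the text states only that selection probability is proportional to fitness.

```python
import random


def select_parents(fitness, k, rng=random):
    """Draw k parents with probability proportional to (non-negative) fitness."""
    agents = list(fitness)
    # Assumption: agents with negative fitness get zero chance of reproducing.
    weights = [max(fitness[a], 0.0) for a in agents]
    if sum(weights) == 0:
        weights = [1.0] * len(agents)  # assumed fallback: uniform choice
    return rng.choices(agents, weights=weights, k=k)


fitness = {"a": 3.0, "b": 1.0, "c": 0.0, "d": -2.0}
parents = select_parents(fitness, 1000, random.Random(1))
```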

The crossover operator exchanges the newsgroup 
fields of two parent agents to create two new offspring,
i.e. one offspring inherits the newsgroup field from one
parent, and the other fields from the other parent; and 
vice versa for the other offspring. 
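
A sketch of this crossover on dictionary-based genotypes follows; the dictionary layout and the example values are assumptions.

```python
import copy


def crossover(parent_a, parent_b):
    """Swap the newsgroup field between two parents, yielding two offspring."""
    child_1 = copy.deepcopy(parent_b)
    child_1["newsgroup"] = parent_a["newsgroup"]  # newsgroup from A, rest from B
    child_2 = copy.deepcopy(parent_a)
    child_2["newsgroup"] = parent_b["newsgroup"]  # newsgroup from B, rest from A
    return child_1, child_2


a = {"newsgroup": "clari.sports.basketball", "keywords": ["celtics"]}
b = {"newsgroup": "clari.news.economy", "keywords": ["oil"]}
child_1, child_2 = crossover(a, b)
```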

The mutation operator replaces the newsgroup field
with another randomly selected newsgroup. This 
newsgroup is selected from the set of newsgroups 
which are "similar" to the one being replaced. The set 
of similar newsgroups is found by looking for shared 
keywords in names of newsgroups. The similarity re-
quirement ensures that an offspring, while distinct
from the parent, is not so different that it cannot
take advantage of the traits learned by the parent 4 .
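
The mutation step can be sketched as below. Treating the dot-separated components of a newsgroup name as its "keywords" is an assumption about how shared keywords are detected, and the newsgroup names are hypothetical.

```python
import random


def similar_newsgroups(name, all_newsgroups):
    """Newsgroups sharing at least one name component with `name`."""
    parts = set(name.split("."))
    return [g for g in all_newsgroups if g != name and parts & set(g.split("."))]


def mutate(genotype, all_newsgroups, rng=random):
    """Replace the newsgroup field with a randomly chosen similar newsgroup."""
    pool = similar_newsgroups(genotype["newsgroup"], all_newsgroups)
    if not pool:
        return dict(genotype)  # assumption: leave the genotype unchanged
    return {**genotype, "newsgroup": rng.choice(pool)}


groups = ["clari.sports.basketball", "rec.sport.basketball.pro",
          "clari.news.economy", "comp.lang.c"]
parent = {"newsgroup": "clari.sports.basketball", "keywords": ["celtics"]}
child = mutate(parent, groups, random.Random(0))
```

Note that under this reading a shared hierarchy prefix such as "clari" already counts as similarity, so a real implementation would probably ignore top-level components.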

To be more specific, the genetic operators actually 
refer to the internal representation when creating the 
offspring. This way the offspring does not only in- 
herit genetic information from its parents, but also 
"learned" information. This simulates "cultural learn- 
ing" in the population of agents (or offspring imitat- 
ing the behavior learned by the parents). The way in 
which this is done is that only the attributes (e.g. au- 
thor, keywords, etc) with high weights are inherited 
by the offspring. At the same time as retaining the 
best characteristics of the parent, the offspring is also 
open to newer influences, because the weights of the 
inherited keywords are reset to small positive values 
in the offspring. 
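
This inheritance of learned attributes can be illustrated as follows. The cut-off that decides what counts as a "high" weight and the reset value are assumptions, since the text gives no numbers.

```python
# Assumed reset value: the paper only says "small positive values".
INITIAL_WEIGHT = 0.1


def inherit(parent_weights, threshold, initial=INITIAL_WEIGHT):
    """Keep only the parent's high-weight attributes, with weights reset."""
    return {
        key: initial
        for key, weight in parent_weights.items()
        if weight >= threshold  # assumed criterion for a "high" weight
    }


parent_weights = {
    ("keywords", "oil"): 0.9,
    ("keywords", "weather"): 0.05,
    ("author", "clarinews"): -0.4,
}
child_weights = inherit(parent_weights, threshold=0.5)
```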

3.7 GA Parameters 

There are some parameters to the genetic algorithm 
such as population size, frequency of crossovers, prob- 
ability of mutation, the number of news articles to be 
presented every day, etc. Some users might have sta-
ble, fixed interests regarding news articles and would
prefer a low occurrence of mutations. For the moment
these parameters have to be set by hand (default val-
ues are provided).

4 In future work, we hope to implement other types of muta-

Figure 2: Creating a new news category

Figure 3: The news categories

4 User Interaction 

One of the goals of this project is to make the user 
interaction as easy as possible. The user should be 
able to satisfy her goals with a minimal amount of in- 
teraction. In this section, we present a sample session 
which describes the way a user interacts with this sys-
tem. This system was implemented using C++, Motif
and BSD UNIX.

The user can define any number of news categories. 
A new news category can be created by specifying the 
name of the category and a set of keywords the user 
might be particularly interested in (as shown in Figure 
2). The system then creates the initial population 
of agents for this news category, as described in the 
previous section. 

Let's say the user has defined four categories, 
namely, business, politics, sports and computers. 
These categories are displayed to the user as shown 
in Figure 3 5 . The user can click on any of the icons to
read the articles recommended by the agents in that
news category. Figure 4 shows the articles selected for
display by the agents in the Business news category.
The articles selected by the agents in a population are
all displayed together. Each of these articles is given
a relevance score by the agent that selects it. The rel-
evance score is displayed alongside the article title (as
indicated by the number of stars prefixed to the article
title). The user can see the contents of any of these
selected articles by double-clicking on the appropriate
article title.

5 The user can create her own icons.
Figure 4: Article headers and body of one selected
article within one news category. "Thumbs-up" and 
"thumbs-down" icons allow for user feedback. 

The user can give positive or negative feedback by 
clicking on the "thumbs-up" or the "thumbs-down"
icon respectively. Positive feedback for an article in-
creases the fitness of the agent(s) which recommended 
it (and vice versa for negative feedback). The key- 
words, location, author and other information pro- 
vided in the header of the article are incorporated into 
the internal representation of the agent. The user can 
also highlight a segment of text from the article body, 
and give positive feedback so that the selected text 
segment is included in the keywords field of the inter- 
nal representation of the agent. 

The interaction described above is the minimal 
amount of interaction a user needs to engage in to 
use this system. In the background, the system pe-
riodically creates new generations in which the good
agents from the previous generation are retained, the 
unfit ones are "retired", and new agents are created 
using genetic operators on the fit agents. By just click- 
ing on "thumbs-up" and "thumbs-down", the user is 
able to control the direction of evolution of popula- 
tions of information filtering agents. 
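
One background generation step might be sketched as below. The survival threshold is mentioned in the text as a tunable parameter, but its use as a simple fitness cut-off, the caller-supplied breed function and the fallback when no agent survives are assumptions.

```python
import random


def next_generation(population, survival_threshold, breed, rng=random):
    """Retain fit agents, retire unfit ones, and refill with offspring."""
    survivors = [a for a in population if a["fitness"] >= survival_threshold]
    if not survivors:
        return list(population)  # assumption: avoid emptying the population
    offspring = []
    # Keep the population size constant by breeding from the survivors.
    while len(survivors) + len(offspring) < len(population):
        offspring.append(breed(rng.choice(survivors)))
    return survivors + offspring


def breed(parent):
    """Hypothetical operator: copy the parent and reset its fitness."""
    return {**parent, "name": parent["name"] + "'", "fitness": 0.0}


population = [
    {"name": "a", "fitness": 3.0},
    {"name": "b", "fitness": 2.0},
    {"name": "c", "fitness": -1.0},
    {"name": "d", "fitness": 0.0},
]
new_population = next_generation(population, 1.0, breed, random.Random(0))
```

In a fuller version, breed would apply the crossover and mutation operators instead of a plain copy.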

A more sophisticated user of the system might want 
to be able to exercise greater control over the popu- 
lation of agents. For example, the user can modify 
the survival threshold, the regeneration rate, the pop- 
ulation size and other parameters which control the 
behavior of the population as a whole. This type of 
user might also want to go down to the level of indi- 
vidual agents and manipulate their internal represen- 
tations, namely, the set of keywords, their weights, the 
newsgroup searched, etc. The system allows the user 
to have access at any of these levels 6 and be able to 
modify any component of the system. 

5 Results 

We have performed initial user tests of the system 
described above. Three different users who were not 
involved in the implementation of the system were 
asked to use the personal retrieval system during one 
whole week. They were also asked to use the regular 
USENET navigational interface to retrieve any arti- 
cles they were interested in that were not retrieved 
automatically. All of their actions with both inter- 
faces were recorded. This way, we were able to com- 
pare the set of articles retrieved automatically with 
the "optimal" set of articles (the set of articles that 
should have been retrieved). While a thorough anal- 
ysis still remains to be done, these initial results have 
been encouraging. 

Two main parameters of information retrieval ef- 
fectiveness are recall, defined as the proportion of rel-
evant articles retrieved, and precision, defined as the
proportion of retrieved articles that are relevant [11]. 
While these parameters are not enough, in general, to 
completely evaluate the performance of an information 
filtering system, they are useful indicators. 

Figure 5 contains three plots of recall and preci- 
sion values with respect to the number of trials for 
three different users. To measure precision, the num- 
ber of articles that were retrieved, and the number of 
articles read by the user were recorded in a manner 
transparent to the user. Precision was calculated as 
the percentage of retrieved articles that were read by 
the user. To find the articles that are relevant to the 
user, a simple interface to USENET was provided and 
users were asked to browse through the database and 
indicate the articles they would have liked the system 
to retrieve. This information was also recorded. Re- 
call was calculated as the ratio of the retrieved articles 
read by the user over the union of those articles with 
the articles retrieved by hand by the user. 
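
The two measures as defined in this paragraph can be computed as in the sketch below. The function and argument names are assumptions, and the values are returned as fractions rather than percentages.

```python
def precision_recall(retrieved, read, found_by_hand):
    """Precision and recall as defined for the user tests.

    retrieved:     articles selected automatically by the agents
    read:          articles the user actually read
    found_by_hand: articles the user located through the plain USENET interface
    """
    retrieved, read, found_by_hand = set(retrieved), set(read), set(found_by_hand)
    hits = retrieved & read             # retrieved articles read by the user
    relevant = hits | found_by_hand     # union with the hand-retrieved articles
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(
    retrieved={1, 2, 3, 4}, read={1, 2}, found_by_hand={5, 6, 7}
)
```

In this example two of the four retrieved articles were read, and three further relevant articles were found by hand, giving a precision of 0.5 and a recall of 0.4.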

The graphs demonstrate that the recall as well as
the precision of the set of articles retrieved improves
the longer the system is in use. The first user (Fig-
ure 5a) did not use any genetic operators. There is
an improvement in precision; however, the recall value
shows minimal changes. As the agents are rewarded
for getting relevant articles, they get better at elimi-
nating irrelevant articles. However, since there are no
new newsgroups introduced through genetic operators,
the recall rate does not improve much.

6 A graphical interface for this level of interaction has yet to
be built.

In the case of the second user (Figure 5b), a second 
generation of agents is created after 5 trials by apply- 
ing genetic operators to the successful half of the pop- 
ulation of agents. This adds new newsgroups, which 
helps improve the recall. There is a slight decrease in 
precision, because the offspring has lost some of the 
information learned by the parents during their life- 
time. This decrease is not too significant, because the 
offspring inherits the fittest attributes from the par- 
ents. 

The third user (Figure 5c) applied genetic opera- 
tors more frequently. In some cases, the newly added 
newsgroups cause a decrease in recall as there is an 
inherent element of randomness. However, repeated 
negative feedback decreases the fitness of these unde- 
sirable newsgroups which are then eliminated when 
genetic operators are applied the next time. 

In any automatic information retrieval system there 
is always a tradeoff between precision and recall (when 
both variables already have fairly good values). If 
one improves recall, then typically precision becomes 
worse and vice versa. One of the advantages of a ge- 
netic approach is that the user can dictate his/her own 
preferred trade-off of recall and precision by control- 
ling the frequency with which genetic operators and 
feedback are applied. In further research we hope to 
demonstrate that if the agents are developed properly, 
it is also possible that high values of both precision and 
recall can be achieved simultaneously. 

These results can be better understood with the 
help of a schematic diagram. In Figure 6, the circle 
represents the set of all articles. The region repre- 
senting the set of relevant articles is shaded by verti- 
cal dotted lines. The articles retrieved by the filter- 
ing system are represented by horizontal dotted lines. 
For narrow or focussed filters, the precision is high 
— almost everything retrieved is relevant — but the 
recall is low since very few articles are actually re-
trieved (Figure 6a). As the search is broadened, the
total number of relevant items retrieved goes up, en- 
hancing the recall; at the same time, the number of 
non relevant retrieved items also grows, decreasing the 
precision (Figure 6b). That is, narrow searches pro- 
duce high precision and low recall, whereas broader 
searches produce the reverse result [11]. 

The results obtained in our experiments suggest 
that in using genetic algorithms, learning from feed- 
back helps improve precision. In terms of the user 
profile, this is a specialization of user's interests. 
Initially, the search is too broad; hence, the recall 
is moderately high while the precision is low. How-
ever, as the agents get feedback for the articles which
the user thinks are relevant (or irrelevant), they retrieve
fewer irrelevant articles. Hence the precision keeps
improving over time without affecting recall too much
(Figure 6c).

Figure 6: Recall and precision of the set of articles
retrieved in the case of a) narrow search and b) broad
search. c) The effects of learning on the set of articles
retrieved: specialization. d) The effects of genetic
operators on the set of articles retrieved: exploration.

Genetic operators on the other hand are respon- 
sible for increasing the recall without sacrificing too 
much precision (Figure 6d). This corresponds to ex- 
ploration of areas which may be of potential inter- 
est to the user. Mutation introduces a random news- 
group that had not been considered before and which 
the user might find relevant. This helps to retrieve 
proportionally more relevant articles, and thereby in- 
creases recall. At the same time, the mutated offspring 
is quite similar to the parent: the inherited precision is
not much worse than that of the parent genotype since
the weakest gene was mutated. In case of crossover, 
the offspring retains the best features of the parents, 
thereby retaining most of the precision learned by the 
parents. At the same time, it also introduces newer 
kinds of articles which the user might possibly like, so 
as to help the recall. 

In all of the cases studied, the users experienced a 
reduction in the time and effort it takes to read news on a
daily basis. We will have to test the system for longer 
periods of time to find out whether after a while the 
precision and recall rate approach numbers that make 
it acceptable to have a purely automated system (as 
opposed to a combination of manual and automated 
selection). The system would have been much more 
efficient if the various news databases had provisions 
for more feature descriptions of articles. In some news- 
groups this is already the case. For example, the ar-
ticle header in Figure 1 contains various features such
as keywords, location, category and priority. We ex-
pect the system to improve a lot once such additional
features are taken into account. 

6 Conclusion and Future Directions 

The paper demonstrated that techniques from Ar- 
tificial Life, in particular a combination of a Genetic 
Algorithm with Learning from Feedback, can be used 
to evolve a personalized system for automatic infor- 
mation filtering. Because of its dynamic nature, this 
system is able to adapt to the changing interests of 
the user. 

We discussed a first prototype which assists the user 
in retrieving USENET Netnews articles. Results ob- 
tained in experiments with this system indicate that 
the genetic algorithm is responsible for improving the 
recall rate of the articles retrieved, while the learning 
mechanism is responsible for improving the precision 
rate. 

While the first prototype produced some promising 
results, much further research remains to be done. 
The internal representation of our retrieving agents 
can be much improved. We intend to research more so- 
phisticated representations which can represent more 
complicated user interests. We further intend to elab- 
orate the graphical aspects of the user interface so as 
to present the user with an animated, graphical world 
of information agents. Eventually, we plan to hand 
the system to users for longer periods of time so as to 
thoroughly evaluate the premises of the project. 
implements a form of cultural co-evolution for synergistic multistrategy machine learning. A collection of diverse 
learning methods embodied as agents attempting to solve a particular problem evolve parameter settings via a 
genetic algorithm. The agents also generate partial solutions which compete with each other to be used by the 
learners, and in the process change the genetic fitness landscape of the learners. Patent pending, and licensed by 
several major US corporations. 

Audio Knowledge Acquisition Tool, with Chuck McMath. A Macintosh application for the management of large 
amounts of audio protocol data. Distributed by the US National Technical Information Service; in use by 
knowledge engineers, psychologists, anthropologists and oral historians. 

Amino Acid Representation Package. Common Lisp code for implementing a wide variety of representations for 
amino acids, including the novel Atoms-Orbitals-Hydrogens (AOH) representation. Used by machine learning 
researchers for protein structure prediction and other tasks. 

AI & Molecular Biology Researchers Database. Database of names, contact information and research interests of 
more than 150 researchers worldwide. In 1995, the second most frequently accessed file on the European 
Molecular Biology Laboratory WAIS server, widely used by students, academics and commercial organizations. 
No longer maintained. 



Prior and Active Research Support 

NIH/CC Research Contract (Lawrence Hunter, Principal Investigator), 7/1/00-6/30/01, $100,000 
Gene expression array analysis for critical care medicine studies. 20% effort 

Performed gene expression array analysis and developed novel methods for the interpretation of data generated by 
NIH Clinical Center investigators in studies of sepsis and multiple organ failure. 

1U01 AA13524-02 (Lawrence Hunter, Principal Investigator) NIH / NIAAA 9/1/01-8/31/06 $500,000 
Neuroinformatics Core Facility for the Integrated Neuroscience Initiative on Alcoholism, 20% effort 
The goal of this project is to develop a bioinformatics resource for a research consortium on alcoholism and 
neuroscience. The specific aims are: (i) integration of multiresolution neuroscience data, (ii) development of novel 
data mining tools to generate hypotheses on neuroadaptation to alcohol, and (iii) design and development of a 
web-based integrated computational analysis workbench for consortium investigators. 

Genetics Institute/Wyeth-Ayerst (Lawrence Hunter, Principal Investigator) 9/01/01-8/31/03, $113,650 
Development of Biological Literature Text Mining Software, 0% effort 

The purpose of this collaboration between the Expression Profiling Informatics ("EPI") group at Wyeth-Ayerst 
Research, and Professor Larry Hunter, Director of the Center for Computational Pharmacology at the University of 
Colorado School of Medicine, is to develop tools and software for automated literature mining. This support 
funds a computational linguist research associate and related expenses. 

1 R24 AA13162-01 (Boris Tabakoff, Principal Investigator) NIH / NIAAA 4/1/01-3/30/06, $999,562 
Gene Expression Array Technology Center for Alcohol Research, 13.33% effort 

The aim of this proposal is to establish a gene array technology core facility to serve as a national resource for 
alcohol research. The bioinformatics group will collaborate with NIAAA investigators in the analysis of 
expression array data and develop a highly integrated database that includes gene expression profile data as well 
as genetic sequence and other data relevant to ethanol-induced changes and ethanol susceptibility. 

1M01 RR00051 (Robert Eckel, Principal Investigator) NIH / NCRR 4/01/02-3/31/07 $6,951,425 
University of Colorado General Clinical Research Center 14.69% effort 

The University of Colorado General Clinical Research Center has implemented a gene expression array facility 
for its users, and Dr. Hunter advises the bioinformatics director and his staff on appropriate analysis techniques for 
this novel and complex class of data. 

5 P30 CA46934-15 (Paul Bunn, Principal Investigator) NIH / NCI 3/01/88-1/31/06 
Cancer Center Support Grant, 23.51% effort 

The University of Colorado Comprehensive Cancer Center (UCCC) is the only NCI-designated comprehensive 
Cancer Center in the Rocky Mountain region. Dr. Hunter is a member of the Biostatistics Core, and contributes to 
the design and analysis of gene expression array experiments and other bioinformatics issues that arise at the 
Center. 

P01 HL68743 (Edward Abraham, Principal Investigator) NIH / NHLBI 9/01/02-8/31/07 $139,171 
Heterogeneous neutrophil responses in acute lung injury, 10% effort 
The overall hypothesis is that neutrophils produce heterogeneous responses to inflammatory stimuli. The 
Molecular Biology Core will perform microarray expression analysis on normal peripheral and BAL neutrophils, 
stimulated neutrophils and neutrophils from patients with acute lung injury. Dr. Hunter participates in gene 
expression array analysis for the Core. 

P01 HL67671-01 (Robert Mason, Principal Investigator) NIH / NHLBI 7/01/01-6/30/04 
SCOR: Pathobiology of Fibrotic Lung Disease, 10% effort 

The overall purpose of this SCOR proposal is to investigate the role of the myofibroblast in idiopathic pulmonary 
fibrosis (IPF). Five projects investigate the source and regulation of TGF-beta production, especially the 
contribution of the ingestion of apoptotic cells and cell debris, the relationship of paracrine factors and mechanical 
factors on myofibroblast gene regulation, the role of survival factors for myofibroblasts such as IGF-1 and 
myofibroblast apoptosis, interactions of myofibroblasts with alveolar epithelial cells, and finally regulation by 
interferon gamma (INF). Dr. Hunter performs informatics duties in the gene expression array core of the project. 

Cystic Fibrosis Foundation, (David Rodman, Principal Investigator) 4/01/01-3/30/03, $500,000 
Effects of Pseudomonas aeruginosa on Inflammatory Gene Expression, 3.48% effort 

The aim of this proposal is to test the hypotheses that (1) Pseudomonas aeruginosa interacts with human airway 
epithelial cells and neutrophils to activate a pro-inflammatory pattern of gene expression, (2) activation is more 
prominent in CF than non-CF epithelium and (3) specific gene products of P. aeruginosa can be identified as 
contributing to this aspect of bacterial virulence. The general experimental approach uses gene arrays, gene traps 
and proteomics. Dr. Hunter directs a bioinformatics group which will perform analyses of the array data. 

R01 HL ???? (Mark Geraci, Principal Investigator) NIH / NHLBI, 10/01/02-9/31/05, $500,000 
Application of expression analysis to study disease pathogenesis, 10% effort 

Create a shared microarray facility to support NHLBI researchers in incorporating both cDNA and 
Affymetrix expression arrays into their research endeavors. Specific aims are to perform developmental projects 
for maximizing RNA amplification techniques and utilizing reference standards and strategies to develop 
algorithms for direct comparison of data from cDNA arrays and Affymetrix arrays; and to develop and implement 
novel bioinformatic approaches to expression data analysis, including "scripted" internet-based analysis for 
NHLBI researchers. Dr. Hunter directs the bioinformatics effort. 

1 R01 DE15191-01 (Richard Spritz, Principal Investigator) NIH / NIDCR, 2/01/03-1/31/07, $250,000 
Gene Discovery for Craniofacial Disorders, 5% effort 

The goal of the proposal is to identify the genes, pathways, and genetic networks that are involved in craniofacial 
development and thus represent targets for genetic and non-genetic determinants of non-syndromic cleft lip and/or 
palate. We plan a careful microarray study of gene expression profiles in the developing face of the mouse. Dr. 
Hunter will apply state-of-the-art bioinformatics tools to analyze and interpret the data. 

Pending Research Support 

NIH 1R01 LM008111-01 (Lawrence Hunter, Principal Investigator), 12/1/03-11/30/06 $499,000 
Technology Development for a Molecular Biology Knowledge-base, 15% effort 

The goal of this proposal is to demonstrate that database integration and natural language information extraction 
technology are adequate to produce in automated fashion a broad, deep knowledge-base of molecular biology. 

1R37 HD19547-19 (Margaret Neville, Principal Investigator) 7/1/03-6/30/08, $250,000 
Physiological factors affecting Human Lactation, 5% effort 

Renewal of Dr. Neville's grant for studies of milk secretion and its regulation. Dr. Hunter would be added to 
oversee bioinformatic analysis of gene expression array studies. 

Selected Lectures and Presentations: 

The Era of Biognostic Machines, keynote address to Association for Computing Machinery Special Interest Group 
on Applied Computing (ACM-SAC) conference, 2003 

Proteomic Bioinformatics, Center for Computational Pharmacology mini-symposium, 2003 

Biognostic Machines for Cognitive Disability, invited address, Coleman Institute annual meeting, 2002 

Bioinformatics and Human Health, UCHSC Chancellor's Luncheon Address, 2002 

Data Mining for High Throughput Biomedicine, keynote address to the Research Society on Alcoholism conference, 
Denver, Colorado, June 2000 

Edgar: Extraction of Drugs, Genes and Relations from the Biomedical Literature, Pacific Symposium on 
Biocomputing, January, 2000 

The Role of Machine Learning and Natural Language Processing in Contemporary Drug Discovery, Pharmacology 
Grand Rounds, University of Colorado School of Medicine, October, 1999 

Inductive Modeling: Power and Pitfalls, keynote address to MODEL-IT conference, Wageningen, the Netherlands, 
November 1998 

Coevolution of Symbol Systems and Behavior, lecture and workshop, Simulations of Adaptive Behavior conference, 
Zurich, Switzerland, August 1998. 

Machine Learning for Drug Discovery, invited address, SmithKline Beecham Data Mining Days, November 1997. 

Computer Science : Biology :: Mathematics : Physics, MIT Media lab, April 1997 

The Role of Computation in Cognitive Science, Krasnow Institute for Advanced Study of Cognition Seminar Series, 
November, 1996. 

Coevolution Learning: Synergistic Evolution of Learning Agents and Problem Representations, Multistrategy 
Learning Workshop, June, 1996. 

AI Models for Biology, and Biological Models for AI, Keynote address, Second International Conference on 
Intelligent Systems for Molecular Biology, July 1995. 

Computers, Modelling, and Theoretical Biology, Invited address to the Keystone Center Scientist to Scientist 
Colloquium, August, 1994 

The National Library of Medicine on the Internet: A Digital Library for Biomedicine. Invited address to the 
Computers and Chemistry Division of the American Chemical Society conference, Aug 1994 

Planning to Discover in Molecular Biology, MIT AI Lab Revolving Seminar Series, April 1994 

Molecular Biology for the Computer Scientist, Full day tutorial at the Hawaii International Conference on System 
Sciences, January 1993. Repeated Jan 1994. 

AI & Molecular Biology, Plenary address, National Conference on Artificial Intelligence, San Jose, CA, July 1992. 

Megaclustering of Unsegmented Datastreams and Applications to Molecular Biology, Johns Hopkins Applied 
Physics Laboratory distinguished lecture series, October 1992. 

Electronic Facilitation of Scientific Communication, Panel organizer and speaker, International Conference on the 
Biomatrix, George Mason University, July 1990 

Knowledge Acquisition Planning for Inference from Large Datasets, Keynote address, 1990 Conference on AI 
Systems in Government, Washington, DC, May 1990 

Machine Learning: Ready for Industrial Application, Invited address to Third Annual Artificial Intelligence Forum, 
Sanibel Island, FL, February, 1989 

Artificial Neural Networks as Theories of Mind. International Neural Network Society, Boston MA, September, 
1988 

Machine Learning for Molecular Biology. Invited address to the Theoretical Biology and Biophysics Group, Los 
Alamos National Laboratory, June 1988 



Indexing and Recognition, AI/BioMed: The First International Conference on Artificial Intelligence and its Impacts 
in Biology and Medicine, Montpellier, France, September 1986 

Computers and Privacy. Guest lecture in Constitutional Law, University of Connecticut at Hartford Law School, 
Dec, 1985. 



